Population-level Balance in Signed Networks
Statistical network models are useful for understanding the underlying
formation mechanism and characteristics of complex networks. However,
statistical models for signed networks have been largely unexplored.
In signed networks, there exist both positive (e.g., like, trust) and negative
(e.g., dislike, distrust) edges, which are commonly seen in real-world
scenarios. The positive and negative edges in signed networks lead to unique
structural patterns, which pose challenges for statistical modeling. In this
paper, we introduce a statistically principled latent space approach for
modeling signed networks and accommodating the well-known balance
theory, i.e., "the enemy of my enemy is my friend" and "the friend of my
friend is my friend". The proposed approach treats both edges and their signs
as random variables, and characterizes the balance theory with a novel and
natural notion of population-level balance. This approach guides us towards
building a class of balanced inner-product models, and towards developing
scalable algorithms via projected gradient descent to estimate the latent
variables. We also establish non-asymptotic error rates for the estimates,
which are further verified through simulation studies. In addition, we apply
the proposed approach to an international relation network, which provides an
informative and interpretable model-based visualization of countries during
World War II.
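The two balance-theory slogans quoted in the abstract both describe triangles whose edge signs multiply to a positive value. The sketch below illustrates that classical triangle-level check on a toy network. It is not the paper's population-level notion of balance or its inner-product model, which the abstract does not specify; the function name and the toy edges are hypothetical.

```python
from itertools import combinations


def balanced_triangle_fraction(signs):
    """Fraction of fully observed triangles whose sign product is positive.

    `signs` maps frozenset({u, v}) to +1 (friend) or -1 (enemy).
    Returns None when the network contains no complete triangle.
    """
    nodes = sorted({node for edge in signs for node in edge})
    balanced = 0
    total = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(edge in signs for edge in edges):
            total += 1
            sign_product = signs[edges[0]] * signs[edges[1]] * signs[edges[2]]
            if sign_product > 0:
                balanced += 1
    return balanced / total if total else None


# Hypothetical toy network (not data from the paper).
toy = {
    frozenset(("A", "B")): +1,  # A and B are friends
    frozenset(("A", "C")): -1,  # C is an enemy of A ...
    frozenset(("B", "C")): -1,  # ... and of B, so triangle ABC is balanced
    frozenset(("A", "D")): +1,
    frozenset(("B", "D")): -1,  # triangle ABD has a negative sign product
}
fraction = balanced_triangle_fraction(toy)
```

Only triangles with all three edges observed are counted, so the pairs C-D, which have no edge, contribute nothing.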
Semi‐supervised joint learning for longitudinal clinical events classification using neural network models
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/163377/2/sta4305.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/163377/1/sta4305_am.pd
The fermentation optimization for alkaline protease production by Bacillus subtilis BS-QR-052
Introduction: Proteases exhibit a wide range of applications, and among them, alkaline proteases have become a prominent area of research due to their stability in highly alkaline environments. To optimize the production yield and activity of alkaline proteases, researchers are continuously exploring different fermentation conditions and culture medium components.
Methods: In this paper, the fermentation conditions for alkaline protease (EC 3.4.21.14) production by Bacillus subtilis BS-QR-052 were optimized, and the effect of different nutrient and fermentation conditions was investigated. Based on the single-variable experiments, the Plackett–Burman design was used to identify the significant factors, and then the optimized fermentation conditions, as well as the interaction between these factors, were evaluated by response surface methodology through the Box–Behnken design.
Results and discussion: The results showed that 1.03% corn syrup powder, 0.05% MgSO4, 8.02% inoculation volume, 1:1.22 vvm airflow rate, as well as 0.5% corn starch, 0.05% MnSO4, 180 rpm agitation speed, 36°C fermentation temperature, 8.0 initial pH and 96 h incubation time were predicted to be the optimal fermentation conditions. The alkaline protease enzyme activity was predicted to be approximately 1787.91 U/mL; subsequent experimental validation reached 1780.03 U/mL, and 500 L scale-up fermentation reached 1798.33 U/mL. This study optimized the fermentation conditions for alkaline protease production by B. subtilis through systematic experimental design and data analysis, and the activity of the alkaline protease increased to 300.72% of its original level. The established model for predicting alkaline protease activity was validated, achieving significantly higher levels of enzymatic activity. The findings provide valuable references for further enhancing the yield and activity of alkaline protease, thereby holding substantial practical significance and economic benefits for industrial applications.
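As a rough illustration of the Box–Behnken step named in the abstract, the sketch below generates a coded three-factor design by varying each pair of factors over their low and high levels while the third stays at its center level. The abstract does not state which factors or levels entered the design, so the three factors, the center values, the half-ranges, and the number of center runs are all hypothetical.

```python
from itertools import combinations, product


def box_behnken(n_factors, n_center=3):
    """Coded Box-Behnken runs (-1, 0, +1) built from factor pairs.

    Each pair of factors takes all four (+/-1, +/-1) combinations while the
    remaining factors sit at 0; `n_center` all-zero runs are appended.
    """
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0,) * n_factors] * n_center)
    return runs


def decode(run, centers, half_ranges):
    """Map a coded run to real factor settings: center + code * half-range."""
    return [c + x * h for x, c, h in zip(run, centers, half_ranges)]


design = box_behnken(3)

# Hypothetical centers and half-ranges for three illustrative factors.
settings = decode((1, 0, -1), centers=[1.0, 0.05, 8.0], half_ranges=[0.5, 0.02, 2.0])
```

With three factors this gives 12 edge-midpoint runs plus the center replicates. No run sets every factor to an extreme at once, which is the usual reason for choosing this design over a full three-level factorial.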
Metastatic patterns and prognosis of patients with primary malignant cardiac tumor
Background: Distant metastases are independent negative prognostic factors for patients with primary malignant cardiac tumors (PMCT). This study aims to further investigate metastatic patterns and their prognostic effects in patients with PMCT.
Materials and methods: This multicenter retrospective study included 218 patients with PMCT diagnosed between 2010 and 2017 from the Surveillance, Epidemiology, and End Results (SEER) database. Logistic regression was utilized to identify metastatic risk factors. A Chi-square test was performed to assess the metastatic rate. Kaplan–Meier methods and Cox regression analysis were used to analyze the prognostic effects of metastatic patterns.
Results: Sarcoma (p = 0.002) and tumor size > 4 cm (p = 0.006) were independent risk factors of distant metastases in patients with PMCT. Single lung metastasis (about 34%) was the most common of all metastatic patterns, and lung metastases occurred more frequently (17.9%) than bone, liver, and brain metastases. Brain metastases had the worst overall survival (OS) and cancer-specific survival (CSS) among the metastatic sites examined (lung, bone, liver, and brain) (OS: HR = 3.20, 95% CI: 1.02–10.00, p = 0.046; CSS: HR = 3.53, 95% CI: 1.09–11.47, p = 0.036).
Conclusion: Patients with PMCT who had sarcoma or a tumor larger than 4 cm had a higher risk of distant metastases. Lung was the most common metastatic site, and brain metastases had the worst survival among the sites examined (lung, bone, liver, and brain). The results of this study provide insight for early detection, diagnosis, and treatment of distant metastases associated with PMCT.
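The Kaplan–Meier method mentioned in the methods can be sketched in a few lines. This is a minimal product-limit estimator written for illustration, not the study's analysis code, and the follow-up times and event indicators below are made-up toy values rather than SEER data.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    `times` are follow-up times; `events` are 1 for an observed death and
    0 for a censored observation. Returns a list of (time, survival) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy data (hypothetical): months of follow-up and event indicators.
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

Censored subjects lower the number at risk without lowering the survival estimate, which is why the curve only steps down at times where a death was observed.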
Statistical Learning for Large-Scale and Complex-Structured Data
Our modern era has seen an explosion in the amount of valuable information stored in large and complex datasets. The growing scale, diversity of data structures, and incomplete observations in these datasets pose new challenges for statistical learning. Motivated by these challenges, this dissertation addresses three important problems below.
(I) The first part of the dissertation presents a novel use of ordinary differential equations (ODE) to enhance modeling flexibility and computational efficiency in survival analysis for complex and incomplete censored data. Despite rich literature on survival analysis, most existing statistical models and estimation methods still suffer from practical limitations such as restricted model capacity and a lack of scalability for large-scale studies. We introduce a unified ODE framework for survival analysis that allows flexible modeling and enables a statistically efficient procedure for estimation and inference. In particular, the proposed estimation procedure is computationally efficient, easy to implement, and applicable to a wide range of survival models. Moreover, to accommodate data in diverse formats, we extend the ODE framework by leveraging deep neural networks for powerful prediction.
(II) The second part of the dissertation focuses on statistical models for signed networks. Statistical network models are useful for understanding the underlying formation mechanism and characteristics of complex networks. However, statistical models for signed networks have been largely unexplored. In signed networks, there exist both positive (e.g., like, trust) and negative (e.g., dislike, distrust) edges, which are commonly seen in real-world scenarios. The positive and negative edges in signed networks lead to unique structural patterns, which pose challenges for statistical modeling. In this part, we introduce a novel latent space approach for modeling signed networks and accommodating the well-known balance theory in social science, i.e., "the enemy of my enemy is my friend" and "the friend of my friend is my friend". The proposed approach treats both edges and their signs as random variables, and characterizes the balance theory with a novel and natural notion of population-level balance. This approach guides us towards building a class of balanced inner-product models, and towards developing scalable algorithms via projected gradient descent to estimate the latent variables. We also establish non-asymptotic error rates for the estimates.
(III) The third part of the dissertation focuses on applications of statistical machine learning to healthcare. In particular, quick and accurate prediction of disease progression can give clinicians valuable information for providing appropriate care in a timely manner. The success of prediction models often relies on the availability of a large amount of labeled training data. However, in many healthcare settings, only a small minority of available data is accurately labeled while unlabeled data is abundant. Further, input variables such as clinical events in the medical records are usually of a complex, longitudinal nature, which poses additional challenges. Motivated by the scarcity of annotated data, we propose a new semi-supervised joint learning method for classifying clinical events data, which requires less labeled training data while maintaining the same prediction performance when compared to the supervised method.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/174618/1/weijtang_1.pd
KL-divergence Based Deep Learning for Discrete Time Model
Neural networks (deep learning) are a modern class of models in artificial
intelligence and have been applied in survival analysis. Although several
improvements have been shown by previous works, training an excellent deep
learning model requires a huge amount of data, which may not be available in
practice. To address this
challenge, we develop a Kullback-Leibler-based (KL) deep learning procedure to
integrate external survival prediction models with newly collected
time-to-event data. Time-dependent KL discrimination information is utilized to
measure the discrepancy between the external and internal data. This is the
first work to consider using prior information to address the problem of limited
data in deep learning for survival analysis. Simulation and real data results show
that the proposed model achieves better performance and higher robustness
compared with previous works.
Comment: This paper is not complete and the results are not ready to be made
public. Therefore we decided to withdraw the paper and plan to submit a newer
version in the future.
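The abstract above describes a time-dependent KL measure of discrepancy between external and internal data but gives no formula. One plausible reading, sketched below purely as an assumption, treats the event indicator in each discrete time interval as a Bernoulli variable and computes the KL divergence between the external model's hazard and the internal hazard interval by interval. The function names and hazard values are hypothetical and are not taken from the paper.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def hazard_kl_profile(external_hazards, internal_hazards):
    """Per-interval KL between two discrete-time hazard sequences (assumed form)."""
    if len(external_hazards) != len(internal_hazards):
        raise ValueError("hazard sequences must cover the same intervals")
    return [bernoulli_kl(p, q) for p, q in zip(external_hazards, internal_hazards)]


# Hypothetical hazards over three time intervals.
external = [0.10, 0.20, 0.50]   # from a previously published model (assumed)
internal = [0.10, 0.40, 0.25]   # estimated on newly collected data (assumed)
profile = hazard_kl_profile(external, internal)
```

Intervals where the two hazards agree contribute zero, so a profile like this shows at which time points the external model disagrees with the new data.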